Audio sparse decompositions in parallel: Let the greed be shared!
Author: Laurent Daudet
Abstract
Greedy methods are often the only practical way of solving very large sparse approximation problems. Among such methods, Matching Pursuit (MP) is undoubtedly one of the most widely used, due to its simplicity and relatively low overhead. Because MP works sequentially, however, it is not straightforward to formulate it as a parallel algorithm that takes advantage of multi-core platforms for real-time processing. In this paper, we investigate how a slight modification of MP makes it possible to break down the decomposition into multiple local tasks, while avoiding blocking effects. Our simulations on audio signals indicate that this Parallel Local Matching Pursuit (PLoMP) gives results comparable to the original MP algorithm, but could potentially run in a fraction of the time: on-the-fly sparse approximations of high-dimensional signals should soon become a reality.

The last two decades have witnessed the advent of sparsity as a major paradigm in many areas of signal processing. Sparsity is the key to success for most state-of-the-art multimedia compression schemes, such as still image coding (for instance JPEG-2000 [1]) and audio coding (MPEG-2/4 Advanced Audio Coding (AAC) [2]). Basically, sparsity exploits the fact that there exist bases in which, for most natural signals, only a few of the transform coefficients are sufficient to provide a good approximation. To be more precise, if the transform coefficients of a given signal are sorted in decreasing order of absolute value, one observes a fast decay, typically a power law with some large negative exponent. This ability to concentrate most of the energy of the signals into only a few of the transform coefficients naturally leads to an increased coding efficiency.
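The power-law decay described above can be written out explicitly. The following is a sketch under assumptions that are not in the original text: the symbols $c_{(k)}$ (the $k$-th largest coefficient in magnitude), $C$, $\alpha$ and $x_K$ (the approximation that keeps the $K$ largest coefficients) are introduced here for illustration, and the basis is assumed orthonormal with $\alpha > 1/2$.

```latex
% Sorted coefficient magnitudes and assumed power-law decay
|c_{(1)}| \ge |c_{(2)}| \ge \dots, \qquad |c_{(k)}| \le C\, k^{-\alpha}, \quad \alpha > \tfrac{1}{2}

% In an orthonormal basis, the K-term approximation error is the energy of the discarded coefficients
\| x - x_K \|^2 \;=\; \sum_{k > K} |c_{(k)}|^2 \;\le\; C^2 \int_K^{\infty} t^{-2\alpha}\, dt \;=\; \frac{C^2}{2\alpha - 1}\, K^{1 - 2\alpha}
```

Under these assumptions, the faster the decay (the larger $\alpha$), the fewer coefficients are needed to reach a given approximation error, which is the sense in which energy compaction helps coding.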
For instance, in the JPEG-2000 image coder based on an orthogonal 2-D dyadic wavelet transform, only portions of the image that correspond to sharp transitions (at objects' edges for instance) will lead to large wavelet coefficients: most of the bit budget is spent in these regions. Similarly, in the Modified Discrete Cosine Transform (MDCT) domain, i.e. the cosine-based filterbank of AAC, the large coefficients represent the perceptually dominant sinusoidal harmonics of the musical content. With a smart quantization of these few large transform coefficients, and an efficient indexing of their parameters, these coders achieve a high compression ratio at virtually no loss of perceptual quality (typically at 1:20 or more for JPEG-2000, and 1:6 for MPEG-2/4 AAC).

Author note: Laurent Daudet is Professor at the Université Paris Diderot Paris 7, Institut Langevin "Waves and Images" (LOA), 10 rue Vauquelin, 75005 Paris, France. This research was supported by the French GIP ANR under contract ANR-06-JCJC-0027-01 "DESAM". Part of this research was done during an Audio research residency at the Banff Centre, Canada. Email: [email protected]

But why does it work? Where does this energy compaction come from? It is basically due to the fact that the transform basis elements "look like" elementary components of the analyzed signals: 2-D wavelets look like the edges of objects in images, discrete cosines look like the harmonics of musical notes. Only a few of these elementary building blocks are thus sufficient to approximate the signals well. It should be emphasized that the corresponding algorithms have a relatively low complexity: in the orthonormal bases described above (discrete wavelets / MDCT), selecting the set of significant coefficients involves a simple thresholding. However, with all these nice properties also comes a major flaw: orthogonal bases are usually too rigid to accommodate even basic invariance properties of our signals.
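The "simple thresholding" available in an orthonormal basis can be illustrated with a small sketch. It is only an illustration under stated assumptions: it uses a plain orthonormal DCT-II rather than the MDCT or a wavelet transform, a synthetic test signal, and hypothetical helper names (`dct_basis`, `analyze`, `synthesize`, `keep_largest`) that do not come from the paper. Keeping the k largest coefficients is equivalent to thresholding at the k-th largest magnitude.

```python
import math

def dct_basis(n):
    """Rows are the orthonormal DCT-II basis vectors of length n (illustrative stand-in for MDCT/wavelets)."""
    basis = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        basis.append([scale * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)])
    return basis

def analyze(signal, basis):
    """Transform coefficients: inner products of the signal with each basis vector."""
    return [sum(b * s for b, s in zip(row, signal)) for row in basis]

def synthesize(coeffs, basis):
    """Rebuild a signal as a weighted sum of basis vectors."""
    n = len(basis[0])
    return [sum(c * row[i] for c, row in zip(coeffs, basis)) for i in range(n)]

def keep_largest(coeffs, k):
    """Zero all but the k largest-magnitude coefficients (thresholding)."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]

n = 64
basis = dct_basis(n)
# Synthetic signal (assumption): two cosine components plus a small perturbation
signal = [3.0 * basis[5][i] + 1.5 * basis[12][i] + 0.01 * math.sin(i) for i in range(n)]

coeffs = analyze(signal, basis)
approx = synthesize(keep_largest(coeffs, 2), basis)

energy = sum(s * s for s in signal)
error = sum((s - a) ** 2 for s, a in zip(signal, approx))
relative_error = error / energy  # small when the signal is sparse in the basis
```

Because the basis is orthonormal, no search is needed: one analysis pass followed by a sort (or a threshold comparison) selects the significant coefficients.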
For instance, standard wavelet image codecs have neither shift- nor rotation-invariance: if the object pictured is slightly moved and/or tilted, then its transform representation is fundamentally different. Similarly, in the audio domain, the MDCT is not shift-invariant: depending on the exact position of the signal with respect to the analysis frames, the transform coefficients may be radically different, and so is the compression efficiency. Furthermore, the single frame length of the MDCT is inappropriate to simultaneously represent both the very sharp attack transients at the onset of percussive notes (where very short windows are desirable) and the long harmonics of tones (where a high frequency resolution is needed, hence long frame sizes).

To achieve higher sparsity, the key is to use decomposition spaces that have more basis vectors than orthonormal bases, and thereby more flexibility. These extended bases are called overcomplete, or redundant, bases. Would you like time-shift invariance in your audio transform? The discrete Gabor transform, also known for 1-D signals as the Short-Time Fourier Transform, is nearly shift-invariant at the cost of (at least) doubling the size of the basis. Would you like shift-invariance in your image coder? The dual-tree complex wavelet [3] offers you this (approximately), but it is now four times overcomplete. With such overcomplete bases, sparsity is improved: basically, the larger (the more redundant) the basis, the more likely it is that, for every local feature of the signal, there will be one basis vector that nearly fits. Overcompleteness brings flexibility and generality in the class of signals that are well, sparsely, represented. Recently, prototype codecs have been developed in many fields of multimedia, for example image [4], audio [5], [6] or video [7]. At very low bitrates (i.e., very high compression ratio), these new codecs outperform standard codecs based on orthogonal transforms.
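To make the idea of an overcomplete basis concrete, here is a minimal sketch of the standard Matching Pursuit iteration mentioned in the abstract, run on a twice-overcomplete dictionary. It is not the PLoMP variant studied in the paper, and everything specific is an assumption for illustration: the dictionary (orthonormal DCT-II cosines plus unit spikes), the synthetic signal (one "tone" plus one "transient"), and the function names.

```python
import math

def dct_atoms(n):
    """Orthonormal DCT-II vectors: stand-ins for tonal, long-duration atoms."""
    atoms = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        atoms.append([scale * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)])
    return atoms

def dirac_atoms(n):
    """Unit spikes: stand-ins for very short, transient atoms."""
    return [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matching_pursuit(signal, atoms, n_iter):
    """Plain sequential MP: pick the unit-norm atom most correlated with the residual, subtract, repeat."""
    residual = list(signal)
    picks = []
    for _ in range(n_iter):
        corrs = [dot(residual, atom) for atom in atoms]
        best = max(range(len(atoms)), key=lambda j: abs(corrs[j]))
        picks.append((best, corrs[best]))
        residual = [r - corrs[best] * a for r, a in zip(residual, atoms[best])]
    return picks, residual

n = 32
atoms = dct_atoms(n) + dirac_atoms(n)  # 2n atoms for an n-dimensional signal

# Synthetic signal (assumption): one cosine component plus one spike
signal = [2.0 * atoms[4][i] for i in range(n)]
signal[10] += 3.0

picks, residual = matching_pursuit(signal, atoms, 4)
signal_energy = dot(signal, signal)
residual_energy = dot(residual, residual)
```

Neither the cosine basis nor the spike basis alone represents this signal with two coefficients, but their union does, and the greedy search finds both components in its first two iterations. The price is that each iteration scans the whole dictionary and depends on the previous residual, which is the sequential bottleneck the abstract refers to.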
Besides coding, there are also many applications that benefit from this sparse energy compaction property [8], for instance information extraction [9], [10], source localization [11] or source separation [12], [13]. So why are these sparse overcomplete transforms not used in
Publication date: 2010